Test statistics G and M for two outliers, due to Grubbs and Murphy
respectively, are compared with a test statistic R based on applying a
single-outlier test to the sample with the most extreme observation removed.

It is shown that G is generally most effective. The G and R procedures
are shown to be equivalent when the number of outliers is unknown, and to
give satisfactory performance for one or two outliers.

An extension to the linear hypothesis is outlined.

(1936) highlighted a severe problem associated with the use of this
statistic: the "masking effect." This effect occurs when in fact more
than one outlier is present. The additional outliers mask each other's
presence by so inflating the sample variance as to reduce the power of
the test severely, in some cases driving it to zero as the outliers
move further to the right.

This masking effect has led to the development of several outlier
test statistics aimed at testing specifically for the presence of K
outliers in the sample (Grubbs 1950, Murphy 1951, Ferguson 1961, Dixon
1950). These K-outlier statistics are competitors to the recursive
procedure: Compute the single-outlier statistic for the full sample,
and for each of the subsamples obtained by deleting the 1, 2, 3, ..., K
most extreme observations, and decide on the basis of some rule using
this sequence whether outliers are present, and how many and which
observations they are.

The object of this paper is to propose a modification to this recursive
procedure, and to compare its performance with that of the Grubbs
and Murphy statistics.


Notation 


Let $X_1, \ldots, X_n$ be n observations believed to come from a normal
distribution $N(\xi, \sigma^2)$, $\sigma$ unknown, and let
$Y_1 > Y_2 > \cdots > Y_n$ be the corresponding order statistics. We will
also suppose that there is at hand some additional information on
$\sigma^2$ in the form of an independent $\sigma^2\chi^2_\nu$ variate W.
If there is no such W, then set $W = \nu = 0$.


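Define

$$\bar{Y} = \sum_{i=1}^{n} Y_i / n, \qquad
  \bar{Y}_1 = \sum_{i=2}^{n} Y_i / (n-1), \qquad
  \bar{Y}_{12} = \sum_{i=3}^{n} Y_i / (n-2),$$

$$S = W + \sum_{i=1}^{n} (Y_i - \bar{Y})^2,$$

$$S_1 = W + \sum_{i=2}^{n} (Y_i - \bar{Y}_1)^2,$$

$$S_{12} = W + \sum_{i=3}^{n} (Y_i - \bar{Y}_{12})^2.$$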


Initially make the somewhat artificial assumption that it is known
a priori that, if outliers are present, there are exactly two, one a
$N(\xi + \lambda\sigma, \sigma^2)$ and the other a
$N(\xi + \delta\sigma, \sigma^2)$ variable with $\lambda, \delta > 0$.
The two-outlier test statistics to be considered are

$$G = S_{12}/S, \qquad
  M = (Y_1 + Y_2 - 2\bar{Y})/\sqrt{S}, \qquad
  R = (Y_2 - \bar{Y}_1)/\sqrt{S_1}.$$
The statistic G is a slight generalisation of one proposed by
Grubbs (1950) (in the original definition, $W = \nu = 0$). It seems to
have no proven optimality properties, but corresponds to the F test
of the Anova model for comparing the three "samples" $Y_1$, $Y_2$ and
$\{Y_3, \ldots, Y_n\}$. For this reason, it seems likely to be effective
when it is not known a priori that $\lambda = \delta$.

Murphy (1951) defined the statistic M for the case $W = \nu = 0$,
and proved its optimality when it is known a priori that
$\lambda = \delta$. We note that this statistic is equivalent to the Anova
test for comparing the two "samples" $\{Y_1, Y_2\}$ and
$\{Y_3, \ldots, Y_n\}$, which in turn is equivalent to the two-sample
t statistic. Of course neither G nor M has a distribution which can be
derived from the conventional Anova distributions, since the "samples"
are obtained by ranking the observations.

Both statistics are extensions of the single-outlier statistic
$B = (Y_1 - \bar{Y})/\sqrt{S}$ to the two-outlier situation. For the
single-outlier case, Grubbs proposed the statistic
$S_1/S = \{1 - nB^2/(n-1)\}$.

The recursive statistic R is clearly not optimal since it does
not use the sufficient statistics $Y_1$, $Y_2$, $\bar{Y}_{12}$ and
$S_{12}$, but is a natural one to consider since, if $Y_1$ is an outlier,
it corresponds to the Paulson optimal statistic for the subsample
$\{Y_2, Y_3, \ldots, Y_n\}$.

The three test statistics are related, in that G and M may be
expressed as functions of B and R. The relationships may be established
by use of some algebraic identities from Quesenberry and David
(1961) and are

$$G = \{1 - nB^2/(n-1)\}\{1 - (n-1)R^2/(n-2)\}$$

$$M = (n-2)B/(n-1) + R\{1 - nB^2/(n-1)\}^{1/2}. \qquad (1)$$
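The identities in (1) can be checked numerically. The following is a minimal sketch, assuming the definitions of S, S_1, S_12, B, R, G and M given in the Notation section; the function names `outlier_statistics` and `g_and_m_from_b_r` are illustrative and do not come from the paper.

```python
import math


def outlier_statistics(sample, w=0.0):
    """Return (B, R, G, M) computed directly from their definitions.

    `w` is the optional independent sum of squares W (0 if there is none).
    """
    y = sorted(sample, reverse=True)  # Y_1 > Y_2 > ... > Y_n
    n = len(y)
    if n < 4:
        raise ValueError("need at least four observations")

    def mean(values):
        return sum(values) / len(values)

    def sum_sq(values):
        m = mean(values)
        return w + sum((v - m) ** 2 for v in values)

    s, s1, s12 = sum_sq(y), sum_sq(y[1:]), sum_sq(y[2:])
    b = (y[0] - mean(y)) / math.sqrt(s)
    r = (y[1] - mean(y[1:])) / math.sqrt(s1)
    g = s12 / s
    m = (y[0] + y[1] - 2 * mean(y)) / math.sqrt(s)
    return b, r, g, m


def g_and_m_from_b_r(b, r, n):
    """Equation (1): G and M expressed as functions of B and R."""
    u = 1 - n * b * b / (n - 1)  # equals S_1 / S
    g = u * (1 - (n - 1) * r * r / (n - 2))
    m = (n - 2) * b / (n - 1) + r * math.sqrt(u)
    return g, m


data = [3.1, 0.2, -0.5, 1.4, 7.9, 6.5, 0.0, -1.2, 0.8, 0.3]
b, r, g, m = outlier_statistics(data)
print(g, m)
print(g_and_m_from_b_r(b, r, len(data)))
```

The two printed pairs should agree to rounding error, with or without a non-zero W, because W cancels in the difference S - S_1.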

The null joint distribution of R and B is given in Hawkins
(1973), where it is shown that if $F_{n,\nu}(\cdot)$ denotes the cumulative
distribution function of B based on a sample of size n, then the
probability

$$\Pr[(b < B < b+db) \cap (x < R < B)]
  = n f(b)\,[F_{n-1,\nu}\{t(b)\} - F_{n-1,\nu}(x)]\,db + o(db) \qquad (2)$$

where

$$t(b) = nb / [(n-1)\{1 - nb^2/(n-1)\}^{1/2}]$$

$$f(b) = \{n/((n-1)\pi)\}^{1/2}\,
  \frac{\Gamma\{\tfrac12(n+\nu-1)\}}{\Gamma\{\tfrac12(n+\nu-2)\}}\,
  \{1 - nb^2/(n-1)\}^{\frac12(n+\nu-4)}.$$

The range is $0 \le x < t(b) < \{(n-1)/n\}^{1/2}$, the constraints $R < B$
and $x < t(b)$ being imposed by the condition $Y_1 > Y_2$.

The Grubbs test has critical region $G < G_0$, whose size is

$$\Pr[G < G_0] = \Pr[\{1 - nB^2/(n-1)\}\{1 - (n-1)R^2/(n-2)\} < G_0]$$

$$= \Pr[\{1 - nB^2/(n-1)\} < G_0\{1 - (n-1)R^2/(n-2)\}^{-1}]. \qquad (3)$$

For any $G_0$ we may integrate (2) over the region defined by (3),
and so get this probability. In this way, the distribution function
of G may be found and fractiles deduced. For the case $W = \nu = 0$,
fractiles were found by Grubbs, who found the density of G directly,
though in terms of a multiple integral over a complicated region.

The Murphy test has critical region $M > M_0$, whose size is

$$\Pr[M > M_0] = \Pr[(n-2)B/(n-1) + R\{1 - nB^2/(n-1)\}^{1/2} > M_0] \qquad (4)$$

and so the null distribution of M may be found in the same way.

Finally, the size of the region $R > R_0$ may be found from (2),
and its fractiles deduced.

Some fractiles of G, M and R are listed in Table 1. Those
fractiles listed both here and in Grubbs (1950) agree. So far as this
author is aware, this is the first table of exact fractiles of M and R.

Figure 1 shows the critical regions of each of these tests for
the case $n = 10$, $\nu = 0$, $\alpha = .05$. Recalling that a large value
of B suggests that $Y_1$ is aberrant, while a large value of R indicates
that $Y_2$ is aberrant, we can interpret these regions as follows:


i) Each test is best in some region of (B, R) in the sense
of rejecting the null hypothesis, while both other tests
accept it.

ii) If $Y_1 \to \infty$ while $\{Y_2, Y_3, \ldots, Y_n\}$ remain constant,
then ultimately (B, R) falls in the acceptance region of the
test based on M, and the critical region of that based on
G. This means that G tends to be significant when in fact
only a single outlier is present. This may be a shortcoming
in the use of G. The fact that M may become non-significant
when $Y_1$ becomes more aberrant is interesting, and
possibly surprising. In fact, if we let $Y_1 \to \infty$,
$\{Y_2, Y_3, \ldots, Y_n\}$ remaining fixed, it can be shown quite
easily that $M \to C = (n-2)/\{n(n-1)\}^{1/2}$. Thus if the fractile
$M_0 > C$, then M will ultimately become non-significant. This
means that M suffers from a type of masking effect, in that
if $M_0 > C$, it is not effective for finding two outliers whose
means are markedly dissimilar.

iii) By contrast, the critical region of R is specific for two
outliers, and is not affected by the relative magnitude of
B and R.
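The limiting behaviour of M described in point ii) is easy to see numerically. The sketch below assumes $W = \nu = 0$ and uses an arbitrary, made-up set of nine fixed observations; `murphy_m` and `limit_c` are illustrative names, not the paper's.

```python
import math


def murphy_m(sample):
    """M = (Y_1 + Y_2 - 2*Ybar) / sqrt(S), with W = 0."""
    y = sorted(sample, reverse=True)
    n = len(y)
    ybar = sum(y) / n
    s = sum((v - ybar) ** 2 for v in y)
    return (y[0] + y[1] - 2 * ybar) / math.sqrt(s)


def limit_c(n):
    """C = (n - 2) / {n (n - 1)}^(1/2), the limit of M as Y_1 grows."""
    return (n - 2) / math.sqrt(n * (n - 1))


# Y_2, ..., Y_n held fixed (illustrative values) while Y_1 moves to the right.
rest = [2.0, 1.1, 0.4, 0.3, 0.0, -0.2, -0.5, -0.9, -1.3]
for y1 in (5.0, 50.0, 5000.0):
    print(y1, murphy_m([y1] + rest))
print("C =", limit_c(len(rest) + 1))
```

As $Y_1$ increases the printed values of M settle at C, so a fractile $M_0$ above C can never be exceeded once $Y_1$ is extreme enough.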

Performance  of  the  tests. 

In this section, measures of the performance of the three tests
in locating exactly two outliers are considered. The best criterion
would be the probability of a correct decision, that is, the probability
that the two contaminants are the first two order statistics of the
sample, and that (B, R) falls in the critical region. The evaluation
of this probability however seems to involve some rather difficult
distributional problems, familiar from the single-outlier case (David
and Paulson 1965), and so an approximation similar to that of David
and Paulson is proposed.

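As a preliminary, suppose that

$$T_1 \sim N(\beta, 1), \qquad T_2 \sim N(\gamma, 1), \qquad V \sim \chi^2_N,$$

all three terms being mutually independent. Let

$$U = T_1 / V^{1/2}, \qquad Z = T_2 / \{V(1+U^2)\}^{1/2}.$$

Transformation methods then show that the joint density of U and Z is
a series in powers of u and z carrying the factor
$\exp\{-\tfrac12(\beta^2 + \gamma^2)\}$ [series illegible in source].
From this, $\Pr[(U > b) \cap \{Z > a(b)\}]$ is obtained as an integral
whose integrand involves $I\{\tfrac12(N+n+1), \tfrac12(i+1)\}$
[integral illegible in source] (5)

where $I(a,b)$ denotes the incomplete beta ratio, and the variable
[remainder illegible in source].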

Returning to the problem, consider the Helmert orthogonal
transformation, defining

$$T_1 = \{n/(n-1)\}^{1/2}(X_1 - \bar{X}),$$

$$T_2 = \{(n-1)/(n-2)\}^{1/2}\Big(X_2 - \sum_{i=2}^{n} X_i/(n-1)\Big),$$

where

$$\beta = \{n/(n-1)\}^{1/2}\delta - \{n(n-1)\}^{-1/2}\lambda, \qquad
  \gamma = \{(n-2)/(n-1)\}^{1/2}\lambda.$$

Letting

$$S = W + \sum_{i=1}^{n} (X_i - \bar{X})^2$$

we identify $T_1$ and $T_2$ with the $T_2$ and $T_1$ of the preceding
theory, V with $S - T_1^2 - T_2^2$, and N with $n+\nu-2$.

Then

$$U = T_2/(S - T_1^2 - T_2^2)^{1/2}, \qquad Z = T_1/(S - T_1^2)^{1/2},$$

$$T_1/S^{1/2} = Z/(1 + Z^2)^{1/2}, \qquad
  T_2/(S - T_1^2)^{1/2} = U/(1 + U^2)^{1/2}.$$

Now

$$B' = \{(n-1)/n\}^{1/2}\,T_1/S^{1/2}$$

$$R' = \{(n-2)/(n-1)\}^{1/2}\,T_2/(S - T_1^2)^{1/2}$$

are simply the B and R statistics obtained by substituting $X_1$
for $Y_1$ and $X_2$ for $Y_2$.

Let C be the critical region for any particular test, and let
$H(\delta,\lambda) = \Pr[(B',R') \in C]$.

This differs from $\Pr[\{(B',R') \in C\} \cap \{X_1 = Y_1, X_2 = Y_2\}]$ by
the factor

$$\Pr[(X_1 = Y_1) \cap (X_2 = Y_2) \mid (B',R') \in C]. \qquad (6)$$

Now we may argue, as in Anscombe (1960), that if
$\delta > \lambda \gg 0$, this conditional probability is close to 1.
Alternatively, we may note as in [...]

$$\Pr[(X_1 = Y_1) \cap (X_2 = Y_2) \mid (B',R')] = 1.$$


Either reasoning shows that the "probability of a correct decision":

$$\Pr[(X_1 = Y_1) \cap (X_2 = Y_2) \cap (B',R') \in C]
  + \Pr[(X_1 = Y_2) \cap (X_2 = Y_1) \cap (B',R') \in C]$$

is well-approximated by $H(\delta,\lambda) + H(\lambda,\delta)$.
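The approximation $H(\delta,\lambda) + H(\lambda,\delta)$ can also be estimated by simulation rather than by the analytic route of the text. The sketch below is an assumption-laden illustration, not the paper's method: it takes $\xi = 0$, $\sigma = 1$, $W = \nu = 0$, and uses the region $R > R_0$ with the value $R_0 = 0.7484$ that the text quotes later for n = 10. The region is intersected with the support condition $R' < t(B')$, which is algebraically the same as $X_1 > X_2$. All function names are hypothetical.

```python
import math
import random


def b_r_prime(x):
    """B' and R': the B and R formulas with X_1, X_2 in place of Y_1, Y_2."""
    n = len(x)
    xbar = sum(x) / n
    s = sum((v - xbar) ** 2 for v in x)
    rest = x[1:]
    xbar1 = sum(rest) / (n - 1)
    s1 = sum((v - xbar1) ** 2 for v in rest)
    return (x[0] - xbar) / math.sqrt(s), (x[1] - xbar1) / math.sqrt(s1)


def t_of_b(b, n):
    """t(b) = n b / [(n - 1) {1 - n b^2 / (n - 1)}^(1/2)]."""
    return n * b / ((n - 1) * math.sqrt(1 - n * b * b / (n - 1)))


def in_r_region(b, r, n, r0):
    """Region R > R_0 restricted to the support R < t(B)."""
    return r0 < r < t_of_b(b, n)


def estimate_h(delta, lam, n=10, r0=0.7484, reps=4000, seed=1):
    """Monte Carlo estimate of H(delta, lam) for the region R > R_0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # X_1 and X_2 are the contaminants; the rest are N(0, 1).
        x = [rng.gauss(delta, 1.0), rng.gauss(lam, 1.0)]
        x += [rng.gauss(0.0, 1.0) for _ in range(n - 2)]
        b, r = b_r_prime(x)
        if in_r_region(b, r, n, r0):
            hits += 1
    return hits / reps


print(estimate_h(3.0, 5.0) + estimate_h(5.0, 3.0))
```

Because the region forces $X_1 > X_2$, the two terms count disjoint events and their sum cannot exceed 1 apart from simulation noise.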

Table 2 shows the values of $H(\delta,\lambda) + H(\lambda,\delta)$ for
the case $n = 10$, $\nu = 0$, $\alpha = .05$ whose critical regions were
sketched in Figure 1.

These values confirm the impressions given by Figure 1. If
$\delta = \lambda$, M is the best test statistic, but its power
deteriorates if $\delta$ is markedly different from $\lambda$. Of interest
is the good performance of G; however its power when $\lambda = 0$ is a
warning of the possibility of rejecting $H_0$ when in fact only one
outlier is present. The recursive test is never best, but is never poor
when, in fact, two outliers are present.


Extension  to  an  unknown  number  of  outliers 


Suppose now that it is not known a priori whether the sample contains
0, 1 or 2 outliers. A question then arises of how to adapt the test
procedures.

The Grubbs statistic is equivalent to an Anova for the three groups
$Y_1$, $Y_2$ and $\{Y_3, \ldots, Y_n\}$. It might be adapted by setting up
the Anova for comparing $Y_1$ with $\{Y_2, \ldots, Y_n\}$ whose test
statistic is equivalent to $S_1/S = \{1 - nB^2/(n-1)\}$. Then $S_1$ is
further partitioned for testing $Y_2$ against $\{Y_3, \ldots, Y_n\}$
yielding a test statistic equivalent to
$S_{12}/S_1 = \{1 - (n-1)R^2/(n-2)\}$. The decision as to the number of
outliers would then be equivalent to a sequential test based on B and R.

The recursive test which has received most attention (McMillan
1971, Hawkins 1973) is:

i) Compute B. If this is non-significant, conclude that there
are no outliers.

ii) Otherwise compute R, and depending on its significance, conclude
that there are one, or two outliers.

It is now well-known that this procedure is defective because of the
masking effect. However this type of problem is very familiar in other
contexts, for example deciding by what order polynomial to detrend
a time series (Anderson 1971), and may be resolved very simply by reversing
the order of testing. This yields the procedure:

i) Compute R. If it is significant conclude that there are
two outliers.

ii) Otherwise, compute B, and depending on its significance,
conclude that there are one, or no outliers.
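The two orders of testing can be contrasted on a small made-up sample with two equal outliers. This is only a sketch: the thresholds `b0` and `r0` are passed in by the caller, the same pair is used for both orders purely for illustration, and the values 0.759 and 0.7484 are the n = 10 fractiles quoted in the text (the former truncated to its legible digits, which is an assumption). Function names are hypothetical and $W = \nu = 0$ is assumed.

```python
import math


def b_and_r(sample):
    """Single-outlier statistic B and recursive statistic R (W = 0)."""
    y = sorted(sample, reverse=True)
    n = len(y)
    ybar = sum(y) / n
    s = sum((v - ybar) ** 2 for v in y)
    ybar1 = sum(y[1:]) / (n - 1)
    s1 = sum((v - ybar1) ** 2 for v in y[1:])
    return (y[0] - ybar) / math.sqrt(s), (y[1] - ybar1) / math.sqrt(s1)


def forward_count(sample, b0, r0):
    """Original order: test B first, then R. Subject to masking."""
    b, r = b_and_r(sample)
    if b <= b0:
        return 0
    return 2 if r > r0 else 1


def reversed_count(sample, b0, r0):
    """Reversed order: test R first, then B."""
    b, r = b_and_r(sample)
    if r > r0:
        return 2
    return 1 if b > b0 else 0


# Two equal outliers inflate S, so B is small even though R is large.
data = [10, 10, -1, 1, -1, 1, -1, 1, -1, 1]
print(b_and_r(data))
print(forward_count(data, b0=0.759, r0=0.7484))
print(reversed_count(data, b0=0.759, r0=0.7484))
```

On this sample B is about 0.62 and R about 0.90, so the forward order stops at the first step and reports no outliers, while the reversed order reports two.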

The fractiles needed for this procedure may be computed by integrating
(2) over the two regions: $C_2 : R > R_0$ and $C_1 : B > B_0$. The test
procedure is not fully defined until one makes a decision as to how
to partition $\alpha$, the overall probability of type I error, between
these two regions. One reasonable partition is to let each have size
$\alpha/2$.

A sample calculation for the case $n = 10$, $\nu = 0$, $\alpha = .05$
yields the fractiles $R_0 = 0.7484$; $B_0 = 0.759b$. Let
$H_i(\delta,\lambda)$ denote $\Pr[(B',R') \in C_i]$, $i = 1,2$, when in
fact outliers are present; then
$H_i(\delta,\lambda) + H_i(\lambda,\delta)$ approximates the probability
of identifying two outliers (i=2) or one outlier (i=1).


Table 3 lists these approximations for this sample case, and shows
that the performance of the recursive procedure is indeed satisfactory.

The recursive procedure extends in a natural way to test for the
possibility of 1, 2, ..., K outliers. The sequence of recursive statistics
omitting the most aberrant 0, 1, 2, ..., K-1 observations is computed,
and tested for significance in reverse order. No fractiles for the
case K > 2 are as yet available, though it is likely that the approach
used in Hawkins (1973) could be extended. It will be shown below that
conservative fractiles may be obtained by the use of the Bonferroni
inequality.

The Murphy statistic extends in an obvious way to accommodate an
unknown number of outliers: For k = 1, 2, ..., K, find the two-sample
t statistic for comparing $\{Y_1, \ldots, Y_k\}$ with
$\{Y_{k+1}, \ldots, Y_n\}$. If the largest of these, corresponding to
$k = k_0$, say, is significant, then conclude that $k_0$ outliers are
present. This statistic, in the case K = n, corresponds to a test
statistic used in Automatic Interaction Detection
(see for example Kass 1975, Scott and Knott 1976).
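The extended Murphy search can be sketched as follows. This assumes the usual pooled-variance two-sample t statistic and $W = \nu = 0$; no fractiles for the maximum are available in the text, so the sketch only returns $k_0$ and the statistics and leaves the significance decision to the caller. The names `two_sample_t` and `extended_murphy` are illustrative.

```python
import math


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def extended_murphy(sample, max_outliers):
    """Return (k0, stats): the k in 1..K with the largest t, and all t values."""
    y = sorted(sample, reverse=True)
    if not 1 <= max_outliers <= len(y) - 2:
        raise ValueError("require 1 <= K <= n - 2")
    stats = {k: two_sample_t(y[:k], y[k:]) for k in range(1, max_outliers + 1)}
    k0 = max(stats, key=stats.get)
    return k0, stats


data = [10, 10, -1, 1, -1, 1, -1, 1, -1, 1]
k0, stats = extended_murphy(data, max_outliers=3)
print(k0, stats)
```

For this made-up sample the split after the two largest observations gives by far the largest t statistic, so $k_0 = 2$.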

However since Table 2 suggests that M is never markedly superior
to G, and may in fact be markedly inferior, no attempt has been made
to study the performance of this procedure. Note that for K = 2 its
fractiles could be deduced from the basic joint distribution (2).

Extension to the linear model

Suppose now that under $H_0$ the $X_i$ are generated by a linear
model

$$X_i \sim N[w_i'\beta, \sigma^2]$$

where $\beta$ is a $p \times 1$ vector of unknown regression coefficients
and $\sigma^2$ is unknown. The alternative model that outliers are present
may be formulated as

$$X_i \sim N[w_i'\beta + \lambda_i, \sigma^2] \qquad (7)$$

where $\lambda_i = 0$ for the good observations. The sign of $\lambda_i$
may, or may not be fixed a priori, depending on circumstances.

The model (7) may be rewritten in terms of an indicator vector as
follows: Let $w_i^{*\prime} = (w_i' : 0\ 0 \ldots 0, 1, 0 \ldots 0)$, the 1
occurring in position i of the indicator vector, and let

$$\beta^{*\prime} = (\beta', \lambda_1, \lambda_2, \ldots, \lambda_n).$$

Then

$$X_i \sim N[w_i^{*\prime}\beta^*, \sigma^2]. \qquad (9)$$

On the face of it, there seem to be two possible approaches to the problem
of handling outliers here. One proceeds by eliminating outliers: any
significant outlier is deleted from the sample, and the usual estimates
of $\beta$ and $\sigma^2$ in (7) computed from the remaining data. The
other method is to use the model (9), estimating $\sigma^2$ and those
$\lambda_i$ that tests indicate to be non-zero.

In fact, the two approaches are identical: the matrix operations
involved in deleting $X_i$ from the sample and updating the regression
(Beckman and Trussell 1975) are identical to those involved in introducing
$\lambda_i$ to the regression.

Given this equivalence of the two methods, it is convenient for
present purposes to work in terms of the second formulation.

In matrix form the model is

$$X \sim N\left[(W : I_n)\begin{pmatrix}\beta \\ \lambda\end{pmatrix},\
  \sigma^2 I_n\right].$$

In solving to find regression coefficients, form

$$(W : X : I)'(W : X : I) =
  \begin{pmatrix} W'W & W'X & W' \\ X'W & X'X & X' \\ W & X & I \end{pmatrix}.$$

Introducing $\beta$ into the regression by sweeping on the first p rows
and columns transforms this matrix to

$$\begin{pmatrix}
  -(W'W)^{-1} & (W'W)^{-1}W'X & (W'W)^{-1}W' \\
  X'W(W'W)^{-1} & X'AX & X'A \\
  W(W'W)^{-1} & AX & A
\end{pmatrix}$$

where $A = I - W(W'W)^{-1}W'$, so that $S = X'AX$ is the residual sum
of squares of the regression of X on W and $e = AX$ is the vector
of residuals.

Let $a_{ij}$ denote the i,jth element of A, and $e_i$ the ith
element of e. Then questions about the introduction of one or more
$\lambda_i$ into the regression focus on the matrix

$$\begin{pmatrix} S & e' \\ e & A \end{pmatrix}.$$

The single-outlier test would be based on the maximum of
$r_{X\lambda_i}$, the partial correlation between X and indicator
variable i. This is

$$r_{X\lambda_i} = e_i/(a_{ii}S)^{1/2},$$

a well-known outlier statistic (Ellenberg 1972). In the case
$w_i \equiv 1$, p = 1, the largest $r_{X\lambda_i}$ satisfies
$r^2 = nB^2/(n-1)$, so this statistic generalizes B.
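A small sketch of the statistic $e_i/(a_{ii}S)^{1/2}$ follows. To stay free of matrix libraries it covers only two designs, a constant alone (p = 1) and a constant plus one covariate (p = 2), for which $a_{ii} = 1 - h_{ii}$ has a closed form; this restriction and the name `outlier_correlations` are assumptions of the sketch, not part of the paper.

```python
import math


def outlier_correlations(x, covariate=None):
    """Return r_i = e_i / sqrt(a_ii * S) for every observation.

    covariate=None fits a constant only (p = 1); otherwise a constant plus
    one covariate (p = 2) is fitted by least squares.
    """
    n = len(x)
    xbar = sum(x) / n
    if covariate is None:
        fitted = [xbar] * n
        hat = [1 / n] * n
    else:
        cbar = sum(covariate) / n
        scc = sum((c - cbar) ** 2 for c in covariate)
        slope = sum((c - cbar) * (v - xbar) for c, v in zip(covariate, x)) / scc
        fitted = [xbar + slope * (c - cbar) for c in covariate]
        hat = [1 / n + (c - cbar) ** 2 / scc for c in covariate]
    e = [v - f for v, f in zip(x, fitted)]        # residual vector e = AX
    s = sum(ei * ei for ei in e)                  # S = X'AX
    # a_ii = 1 - h_ii is the i-th diagonal element of A
    return [ei / math.sqrt((1 - h) * s) for ei, h in zip(e, hat)]


x = [1.2, 0.4, 3.9, 2.2, 9.5, 4.1]
print(max(outlier_correlations(x)))
print(max(outlier_correlations(x, covariate=[1, 2, 3, 4, 5, 6])))
```

In the constant-only case the largest value equals $\{n/(n-1)\}^{1/2}B$, matching the relation stated above.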

"i 

Unlike the case p = 1, the case p > 1 poses difficulties for testing
for the presence of two or more outliers: the very difficulties that
beset the regression subset problem, namely that the best subset of
size k to include in the regression does not necessarily contain the
best subset of size k-1.

If the number of outliers k is specified a priori, then no conceptual
difficulty arises. The best subset of k predictors amongst the
$\lambda_i$ is found by using a branch and bound multiple regression
algorithm. The analysis of variance then yields a test statistic
equivalent to G.

The use of the recursive procedure is equivalent to a stepwise
regression in which one introduces K of the $\lambda_i$ in order of
significance, and then proceeds to eliminate them. As in stepwise
regression, a conservative test may be set up by concluding that there
are fewer than k outliers if the partial correlation corresponding to
the kth is not significant at the $\alpha/\{K(n-k+1)\}$ level. Experience
with multiple regression shows that there is no guarantee that this
procedure will identify the correct outliers, especially since A has
rank only n-p, and so is highly multicollinear.
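The Bonferroni level in the paragraph above is a one-line computation; the helper below (a hypothetical name, shown only to make the arithmetic concrete) evaluates $\alpha/\{K(n-k+1)\}$ for each step.

```python
def bonferroni_level(alpha, max_outliers, n, k):
    """Per-test level alpha / {K (n - k + 1)} for the k-th candidate outlier."""
    if not 1 <= k <= max_outliers <= n:
        raise ValueError("require 1 <= k <= K <= n")
    return alpha / (max_outliers * (n - k + 1))


for k in (1, 2):
    print(k, bonferroni_level(0.05, max_outliers=2, n=10, k=k))
```

With $\alpha = .05$, K = 2 and n = 10 the levels are about 0.0025 for k = 1 and about 0.0028 for k = 2.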

For the case $w_i \equiv 1$, p = 1, the Bonferroni bound is conservative:
for example in the test case considered earlier, the .05 fractile
of R corresponds to a Bonferroni fractile with $\alpha = .1$. Considerable
experience is still needed to establish whether this procedure is effective.

